Abstract. Community detection is of great importance for understanding
graph structure in social networks. The communities in real-world
networks often overlap, i.e. some nodes may be members of multiple
clusters. How to uncover the overlapping communities/clusters in a
complex network is a general problem in data mining of network data
sets. In this paper, a novel algorithm is devised to identify overlapping
communities in complex networks by combining an evidential modularity
function, a spectral mapping method and evidential c-means clustering.
Experimental results indicate that this detection approach can take
advantage of the theory of belief functions, and performs well both at
detecting community structure and at determining the appropriate number
of clusters. Moreover, the credal partition obtained by the proposed
method can give deeper insight into the graph structure.

Keywords: Evidential modularity; Evidential c-means; Overlapping
communities; Credal partition


1 Introduction 

In order to better understand the organization and function of real
networked systems, the community structure, or the clustering in the graph,
is a primary feature that should be taken into consideration [3]. As a result,
community detection, which can extract specific structures from complex networks,
has attracted considerable attention across many areas, from physics, biology,
and economics to sociology [1], where systems are often represented as graphs.

Generally, a community in a network is a subgraph whose nodes are densely
connected within itself but sparsely connected with the rest of the network [?].
Many community detection approaches are in the frame of probability
theory, that is to say, one actor in the network can belong to only one
community of the graph [?]. However, in real-world networks, each node can fully or
partially belong to more than one associated community, and thus communities
often overlap to some extent [?]. For instance, in collaboration networks, a
researcher may be active in many areas but with different levels of commitment,
and in social networks, an actor usually has connections to several social groups
like family, friends, and colleagues. In biological networks, a node might have
multiple functions [?].

In recent decades, interest has grown in overlapping community detection
algorithms, which identify clusters that are not necessarily disjoint.
Zhang et al. [?] devised an algorithm to identify overlapping communities
in complex networks based on fuzzy c-means (FCM). Nepusz et al. [8] created an
optimization algorithm for determining the optimal fuzzy membership degrees,
and introduced a new fuzzified variant of the modularity function to determine
the number of communities. Havens et al. [?] discussed a new formulation of
a fuzzy validity index and pointed out that this modularity measure performs better
than the existing ones.

As can be seen, most methods for uncovering the overlapping community
structure are based on the idea of fuzzy partition, which subsumes crisp partition,
giving fuzzy community detection greater expressive power than
hard detection. However, the credal partition [2], which is even more general and
in some cases allows deeper insight into the structure of the data, has
not yet been applied to community detection.

In this paper, an algorithm for detecting overlapping community structure
is proposed based on the credal partition. An evidential modular function is
introduced to determine the optimal number of communities. Spectral relaxation and
evidential c-means are used to obtain the basic belief assignment (bba) of
each node in the network. Experiments on two well-studied networks show
that meaningful partitions of the graph can be obtained by the proposed
detection approach, and that it provides more information about the graph
structure than the existing methods.

2 Background 

2.1 Modularity-based community detection 

Let G(V, E, W) be an undirected network, where V is the set of n nodes, E is the set of
m edges, and W is an n × n edge weight matrix with elements $w_{ij}$, $i, j = 1, 2, \cdots, n$.
The objective of hard (crisp) community detection is to divide graph G into
c clusters, denoted by

$$\Omega = \{\omega_1, \omega_2, \cdots, \omega_c\}, \tag{1}$$

and each node should belong to one and only one of the detected communities [5].
Parameter c can be given in advance or determined by the detection method
itself.

The modularity, which measures the quality of a partition of a graph, was
first introduced by Newman and Girvan [?]. This validity index measures how
good a specific community structure is by calculating the difference between the
actual intra-cluster edge density in the obtained partition and the one expected
under some null model, such as a random graph. One of the most popular forms
of modularity is given in [5]. Given a partition with c groups as in Eq. (1),
and letting $\|W\| = \sum_{i,j=1}^{n} w_{ij}$ and $k_i = \sum_{j=1}^{n} w_{ij}$, its modularity can be defined as:

$$Q_h = \frac{1}{\|W\|} \sum_{k=1}^{c} \sum_{i,j=1}^{n} \left( w_{ij} - \frac{k_i k_j}{\|W\|} \right) \delta_{ik} \delta_{jk}, \tag{2}$$

where $\delta_{ik}$ is one if vertex i belongs to the kth community, and 0 otherwise.

The communities of graph G can be detected by modularity optimization, for
example with spectral clustering algorithms [13], which aim at finding the
optimal partition with the maximum modularity value [3].
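As a minimal sketch of how the crisp modularity of Eq. (2) can be evaluated, the following NumPy snippet computes it for a hard labelling. The function name `crisp_modularity` and the toy graph (two triangles joined by one edge) are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def crisp_modularity(W, labels):
    """Crisp modularity of Eq. (2) for a symmetric weight matrix and hard labels."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    total = W.sum()                    # ||W|| = sum_{i,j} w_ij
    k = W.sum(axis=1)                  # k_i = sum_j w_ij
    B = W - np.outer(k, k) / total     # w_ij - k_i k_j / ||W||
    # same[i, j] is True when i and j share a community,
    # i.e. sum_k delta_ik delta_jk = 1
    same = labels[:, None] == labels[None, :]
    return float((B * same).sum() / total)

# Toy graph (hypothetical): two triangles connected by the edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

q_split = crisp_modularity(W, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = crisp_modularity(W, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph the split into the two triangles gives a modularity of 5/14, while putting all nodes in one community gives 0, which matches the intuition that modularity rewards dense groups that are sparsely connected to each other.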


2.2 Belief function theory and evidential c-means 

The credal partition, a general extension of the crisp and fuzzy ones in the
theoretical framework of belief function theory, has been introduced in [2,7]. Suppose
the discernment frame of the clusters is $\Omega$ as in Eq. (1). Partial knowledge
regarding the actual cluster that node $n_i$ belongs to can be represented by a basic belief
assignment (bba), defined as a function m from the power set of $\Omega$ to [0,1], verifying
$\sum_{A \subseteq \Omega} m(A) = 1$. Every $A \in 2^{\Omega}$ such that m(A) > 0 is called a focal element.
The credibility and plausibility functions are defined in Eq. (3) and Eq. (4):

$$Bel(A) = \sum_{B \subseteq A, B \neq \emptyset} m(B), \quad \forall A \subseteq \Omega, \tag{3}$$

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq \Omega. \tag{4}$$

Each quantity Bel(A) represents the degree to which the evidence supports
A, while Pl(A) can be interpreted as an upper bound on the degree of support
that could be assigned to A if more specific information became available [12]. The
function $pl : \Omega \to [0,1]$ such that $pl(\omega) = Pl(\{\omega\})$ is called the contour function
associated to m.
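The definitions in Eq. (3) and Eq. (4) can be illustrated with a small sketch in plain Python. Here a bba is stored as a dictionary from `frozenset` focal elements to masses; this representation, the function names `bel` and `pl`, and the numerical masses are assumptions chosen for the example only.

```python
def bel(m, A):
    """Credibility Bel(A): total mass of the nonempty subsets of A, Eq. (3)."""
    A = frozenset(A)
    return sum(mass for B, mass in m.items() if B and B <= A)

def pl(m, A):
    """Plausibility Pl(A): total mass of the sets that intersect A, Eq. (4)."""
    A = frozenset(A)
    return sum(mass for B, mass in m.items() if B & A)

# Illustrative bba on the frame {w1, w2}; the masses are made up.
m = {
    frozenset({"w1"}): 0.5,
    frozenset({"w2"}): 0.2,
    frozenset({"w1", "w2"}): 0.3,
}

# Contour function: plausibility of each singleton.
contour = {w: pl(m, {w}) for w in ("w1", "w2")}
```

With these masses, Bel({w1}) is 0.5 and Pl({w1}) is 0.8: the 0.3 placed on the imprecise set {w1, w2} does not support w1 specifically, but it does not contradict it either.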

The bbas at the credal level can be expressed in the form of probabilities by the
pignistic transformation [5], which is defined as

$$BetP(\omega_i) = \sum_{\omega_i \in A \subseteq \Omega} \frac{m(A)}{|A| (1 - m(\emptyset))}, \tag{5}$$

where |A| is the number of elements of $\Omega$ in A.
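A short sketch of the pignistic transformation of Eq. (5) follows, again in plain Python with a bba stored as a dictionary keyed by `frozenset` (an assumed representation). The function name `pignistic` and the masses are hypothetical.

```python
def pignistic(m, omega):
    """Pignistic probabilities BetP of Eq. (5) for every element of the frame."""
    empty_mass = m.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("BetP is undefined when all mass is on the empty set")
    return {
        # each focal set A containing w shares its mass equally among its elements
        w: sum(mass / len(A) for A, mass in m.items() if w in A) / (1.0 - empty_mass)
        for w in omega
    }

# Illustrative bba with some mass on the empty set; the values are made up.
m = {
    frozenset(): 0.1,
    frozenset({"w1"}): 0.4,
    frozenset({"w2"}): 0.2,
    frozenset({"w1", "w2"}): 0.3,
}
betp = pignistic(m, ["w1", "w2"])
hard_label = max(betp, key=betp.get)  # class with the highest pignistic probability
```

The resulting values sum to one, so they can be read like fuzzy memberships, and taking the class with the largest value gives a crisp assignment.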

Evidential c-means (ECM) [7] is a direct generalization of FCM. The optimal
credal partition is obtained by minimizing the following objective function:

$$J_{ECM} = \sum_{i=1}^{n} \sum_{A_j \subseteq \Omega, A_j \neq \emptyset} |A_j|^{\alpha} m_i(A_j)^{\beta} d_{ij}^2 + \sum_{i=1}^{n} \delta^2 m_i(\emptyset)^{\beta}, \tag{6}$$

constrained on

$$\sum_{A_j \subseteq \Omega, A_j \neq \emptyset} m_i(A_j) + m_i(\emptyset) = 1, \tag{7}$$

where $m_i(A_j)$ is the bba of $n_i$ given to the nonempty set $A_j$, while $m_i(\emptyset)$ is the
bba of $n_i$ assigned to the empty set. The value $d_{ij}$ denotes the distance between $n_i$
and the barycenter associated to $A_j$, and $|\cdot|$ is the cardinality of the set. Parameters
$\alpha$, $\beta$, $\delta$ are adjustable and can be determined based on the requirement.
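To make the roles of the cardinality weight, the distances to the barycenters and the empty-set term concrete, the sketch below only evaluates the ECM objective for a given set of masses and cluster centers; it does not perform the alternating optimization of ECM [7]. It assumes, following the usual ECM formulation, that the barycenter of a focal set is the mean of the centers of its clusters. The function name `ecm_objective`, the array layout and the default values of `alpha`, `beta` and `delta` are choices made for this example.

```python
import numpy as np

def ecm_objective(X, centers, focal_sets, masses, alpha=1.0, beta=2.0, delta=10.0):
    """Value of the ECM objective for fixed masses and centers.

    X          : (n, p) data points
    centers    : (c, p) cluster centers
    focal_sets : list of tuples of cluster indices (the nonempty focal sets A_j)
    masses     : (n, 1 + len(focal_sets)) array; column 0 is the mass of the
                 empty set, column j + 1 is the mass given to focal_sets[j]
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    masses = np.asarray(masses, dtype=float)
    total = 0.0
    for j, A in enumerate(focal_sets):
        # assumed barycenter: mean of the centers of the clusters in A_j
        barycenter = centers[list(A)].mean(axis=0)
        d2 = ((X - barycenter) ** 2).sum(axis=1)          # squared distances d_ij^2
        total += len(A) ** alpha * (masses[:, j + 1] ** beta * d2).sum()
    total += delta ** 2 * (masses[:, 0] ** beta).sum()    # empty-set (outlier) term
    return float(total)

# Two points sitting exactly on two centers (toy data).
X = [[0.0, 0.0], [2.0, 0.0]]
centers = [[0.0, 0.0], [2.0, 0.0]]
focal_sets = [(0,), (1,), (0, 1)]
specific = [[0, 1, 0, 0], [0, 0, 1, 0]]    # all mass on the matching singleton
imprecise = [[0, 0, 0, 1], [0, 0, 0, 1]]   # all mass on the whole frame
j_specific = ecm_objective(X, centers, focal_sets, specific)
j_imprecise = ecm_objective(X, centers, focal_sets, imprecise)
```

In this toy case the specific assignment costs 0 and the fully imprecise one costs 4, showing how the factor based on the cardinality of the focal set penalizes mass placed on larger sets.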
3 Evidential community detection

Before presenting the credal partition of a graph G(V, E, W), the hard and fuzzy
partitions are first recalled. The crisp partition can be represented by a matrix
$U^h = (u^h_{ik})_{n \times c}$, where $u^h_{ik} = 1$ if the ith node $n_i$ belongs to the kth cluster $\omega_k$
in the partition, and $u^h_{ik} = 0$ otherwise. From the property of this partition, it
clearly should satisfy $\sum_{k=1}^{c} u^h_{ik} = 1$, $i = 1, 2, \cdots, n$. The generalization of
the hard partition, in which a node may belong to more than one community
but with different degrees, can be described by the fuzzy partition matrix
$U^f = (u^f_{ik})_{n \times c}$, where $u^f_{ik}$ is not restricted to {0,1} but can attain any real
value from the interval [0,1]. The value $u^f_{ik}$ can be interpreted as the degree of
membership of $n_i$ to community $\omega_k$.

The credal partition of G, which refers to the framework of belief function
theory, can be represented by an n-tuple: $M = (m_1, m_2, \cdots, m_n)$. Each
$m_i = \{m_{i1}, m_{i2}, \cdots, m_{i2^c}\}$ is a bba in a $2^c$-dimensional space, where c is the
cardinality of the given discernment frame of communities $\Omega = \{\omega_1, \omega_2, \cdots, \omega_c\}$
as before, and $\omega_i$ denotes the ith detected community. Note that $\Omega$ is the
discernment frame in the framework of belief function theory.

3.1 The evidential modular function 

Similar to the fuzzy modularity by Nepusz et al. [8] and by Havens et al. [?],
here we introduce an evidential modularity:

$$Q_e = \frac{1}{\|W\|} \sum_{k=1}^{c} \sum_{i,j=1}^{n} \left( w_{ij} - \frac{k_i k_j}{\|W\|} \right) pl_{ik} pl_{jk}, \tag{8}$$

where $pl_i = \{pl_{i1}, pl_{i2}, \cdots, pl_{ic}\}$ is the contour function associated to $m_i$, which
describes the upper value of our belief in the proposition that the ith node
belongs to the kth community.

Let $k = (k_1, k_2, \cdots, k_n)^T$, $B = W - k k^T / \|W\|$, and $PL = (pl_{ik})_{n \times c}$; then
Eq. (8) can be rewritten as:

$$Q_e = \frac{\mathrm{trace}(PL^T B \, PL)}{\|W\|}. \tag{9}$$

$Q_e$ is a direct extension of the crisp modularity function (2). When the credal
partition degrades into the hard one, $Q_e$ is equal to $Q_h$.
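The matrix form of the evidential modularity in Eq. (9) can be sketched in a few lines of NumPy. The function name `evidential_modularity`, the toy graph and the plausibility values below are assumptions made for illustration. With PL of size n × c, the c × c product inside the trace is taken as PL^T B PL.

```python
import numpy as np

def evidential_modularity(W, PL):
    """Evidential modularity in the trace form of Eq. (9).

    W  : (n, n) symmetric weight matrix
    PL : (n, c) matrix of contour-function values pl_ik
    """
    W = np.asarray(W, dtype=float)
    PL = np.asarray(PL, dtype=float)
    total = W.sum()                       # ||W||
    k = W.sum(axis=1)                     # node strengths k_i
    B = W - np.outer(k, k) / total        # entries w_ij - k_i k_j / ||W||
    return float(np.trace(PL.T @ B @ PL) / total)

# Toy graph (hypothetical): two triangles connected by the edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

# Hard credal partition: all mass on one singleton, so pl_ik is 0 or 1.
PL_hard = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
# Made-up credal partition where the two bridge nodes keep some plausibility
# for the other community.
PL_soft = np.array([[1, 0], [1, 0], [1, 0.4], [0.4, 1], [0, 1], [0, 1]], dtype=float)

q_hard = evidential_modularity(W, PL_hard)
q_soft = evidential_modularity(W, PL_soft)
```

For the hard partition the plausibilities are indicator values, and the result equals the crisp modularity of the same split (5/14 on this toy graph), which is the degenerate case described in the text.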

3.2 Spectral mapping 

White and Smyth [?] showed that optimizing the modularity measure Q can be
reformulated as a spectral relaxation problem and proposed spectral clustering
algorithms that seek to maximize Q. By eigendecomposing a related matrix,
these methods map the graph nodes into a Euclidean space, in which the
clustering problem is equivalent to that on the original graph.

Let $A = (a_{ij})_{n \times n}$ be the adjacency matrix of the graph G. The adjacency
matrix of a weighted graph is the matrix whose element $a_{ij}$ represents
the weight $w_{ij}$ of the edge connecting nodes i and j. The degree matrix $D = (d_{ii})$
is the diagonal matrix whose elements are the degrees of the nodes of G, i.e.
$d_{ii} = \sum_{j=1}^{n} a_{ij}$. The eigenvectors of the transition matrix $\mathcal{M} = D^{-1} A$ are used.

Verma and Meila [?] and Zhang et al. [?] suggested using the eigenvectors
of the generalised eigensystem $Ax = \lambda Dx$, and pointed out that this is
mathematically equivalent to, and numerically more stable than, computing the
eigenvectors of the matrix $\mathcal{M}$ [14]. To partition the nodes of the graph into c communities, the top
c − 1 eigenvectors of the above eigensystem are used to map the graph data into
points in Euclidean space, where traditional clustering methods, such as
c-means (CM), FCM and ECM, can be invoked.
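A possible NumPy sketch of this spectral mapping is given below. It is an illustration under stated assumptions rather than the authors' implementation: the generalized problem is solved through the symmetric matrix D^{-1/2} A D^{-1/2}, an equivalent reformulation chosen here so that only `numpy.linalg.eigh` is needed, and the graph is assumed to be connected with no isolated nodes. The leading eigenvector (eigenvalue 1) is constant over the nodes, so it is dropped, leaving c − 1 coordinates per node. The function name `spectral_map` is hypothetical.

```python
import numpy as np

def spectral_map(A, c):
    """Map nodes to R^(c-1) using the top eigenvectors of A x = lambda D x."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # node degrees (assumed all positive)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form: S u = lambda u with S = D^{-1/2} A D^{-1/2};
    # then x = D^{-1/2} u solves A x = lambda D x.
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, U = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:c]     # indices of the c largest eigenvalues
    E = d_inv_sqrt[:, None] * U[:, order]  # generalized eigenvectors e_1 .. e_c
    return E[:, 1:]                        # drop the constant e_1, keep e_2 .. e_c

# Toy graph (hypothetical): two triangles connected by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

E = spectral_map(A, c=2)                   # one coordinate per node
side = np.sign(E[:, 0])                    # sign pattern of the embedding
```

On this toy graph the single retained coordinate has one sign on the first triangle and the opposite sign on the second, so any of the clustering methods mentioned above would separate the two groups in the embedded space.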

3.3 Evidential community detection scheme 

Let C be the upper bound of the number of communities. The evidential
community detection scheme is as follows:

S.1 Spectral mapping:

For 2 ≤ c ≤ C, find the top c generalized eigenvectors $E_c = [e_1, e_2, \cdots, e_c]$
of the eigensystem $Ax = \lambda Dx$, where A and D are the adjacency and the
degree matrix respectively.

S.2 Evidential c-means:

For each value of c (2 ≤ c ≤ C), let $E_c = [e_2, \cdots, e_c]$. Use ECM to partition
the n samples (each row of $E_c$ is a sample in the (c − 1)-dimensional
Euclidean space) into c classes. This yields a credal partition M for the
graph.

S.3 Choosing the number of communities:

Find the suitable number of clusters and the corresponding evidential
partition scheme by maximizing the evidential modular function $Q_e$.

In the algorithm, C can be determined from the original graph. It is an empirical
range of the community number of the network. If c is given, we can get a credal
partition using the proposed method, and then the evidential modularity can
be derived. The modularity is a function of c and it should peak around the
optimum value of c for the given network. As in ECM, the number of parameters
to be optimized is exponential in the number of communities and linear in the
number of nodes. When the number of communities is large, we can reduce the
complexity by considering only a subclass of bbas with a limited number of focal
sets [7].

4 Experimental results 

To evaluate the proposed method, two real-world networks are
discussed in this section. A comparison of the communities detected by credal,
hard and fuzzy partitions is also given to show the advantages of the evidential
community structure over the others.

4.1 Zachary’s Karate Club

Zachary’s Karate Club [?] is an undirected graph which consists of 34
vertices and 78 edges, describing the friendship between members of the club
observed by Zachary in his two-year study. This club is visually divided into
two parts, due to an incipient conflict between the president and the instructor (see
Fig. 2-a).

The modularity peaks around c = 2 or c = 3, as shown in Fig. 1-a. With
c = 3, the communities detected by CM, FCM and ECM are displayed in Fig. 2.
As can be seen, a small community separated from $\omega_1$ is detected by all
the approaches. The result by FCM shown here is obtained by assigning each node to
the cluster with the highest membership. Zhang et al. [?] suggested using a
threshold $\lambda$ to convert the fuzzy membership into the final community structure.
For node i, let the fuzzy assignment to its communities be $\mu_{ij}$, $j = 1, 2, \cdots, c$.
Node i is regarded as a member of every community $\omega_k$ with $\mu_{ik} > \lambda$. But
there is no criterion for determining the appropriate $\lambda$. In ECM, however, we
can directly get the imprecise classes indicating our uncertainty on the actual
cluster of some nodes by hard credal partitions [7].
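The dependence of the fuzzy result on the threshold can be seen in a small sketch. The membership matrix and the function name `overlapping_memberships` are hypothetical; only the thresholding rule itself (membership strictly above the threshold) is taken from the description above.

```python
def overlapping_memberships(U, lam):
    """For each node, the set of communities whose fuzzy membership exceeds lam."""
    return [{k for k, mu in enumerate(row) if mu > lam} for row in U]

# Made-up fuzzy memberships of three nodes to two communities.
U = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
]
loose = overlapping_memberships(U, 0.4)   # second node counted in both communities
strict = overlapping_memberships(U, 0.5)  # second node counted in one community only
```

The second node is an overlapping node for a threshold of 0.4 but not for 0.5, which illustrates why the lack of a criterion for choosing the threshold matters.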

As we can see in Fig. 2-c, for ECM, nodes 1, 9, 10, 12 and 31 belong to two clusters
at the same time. This coincides with the conclusion in [?], apart from the
fact that a significantly high membership value is given to $\omega_1$ for node 12 by
FCM. Actually, the case that node 12 is clustered into $\omega_{12} = \{\omega_1, \omega_2\}$ seems
reasonable when the special behavior of this node is considered. Person
12 has no contact with others except the instructor (node 1). Therefore, the
most probable class of node 12 should be the same as that of node 1. It is
counterintuitive if person 12 is assigned to either $\omega_1$ or $\omega_2$, as it has no
relation with any member in these two communities at all. The credal partition
can reflect the fact that $\omega_1$ and $\omega_2$ are indistinguishable for node 12, while the
fuzzy method cannot. Furthermore, the mass of belief assigned to imprecise
classes reflects our degree of uncertainty on the clusters of the included nodes.
As illustrated in Fig. 3-b, the mass given to imprecise clusters for node 1 is
larger than that for the other four nodes. This reflects that our uncertainty about
node 1’s community is the largest. As node 1 is the instructor of the club, this
seems reasonable.

Actually, the concept of credal partitions suggests different ways of
summarizing data. For example, the data can be analysed in the form of a fuzzy partition
thanks to the pignistic probability transformation shown in Eq. (5). As shown
in Fig. 3-a, pignistic probabilities play the same role as fuzzy memberships. A crisp
partition can then be easily obtained by assigning each node to the
community with the highest pignistic probability. In this sense, the proposed method
can be regarded as a general model of hard and fuzzy community detection
approaches.


Fig. 1. Modularity values with community numbers (a. Karate Club network; b. American football network).





Fig. 2. Karate Club.

Fig. 3. Clustering results of Karate club network (a. Probability membership; b. Mass belief).


4.2 American football network 


The network we investigate in this experiment is the world of American college 
football games between Division IA colleges during regular season Fall 2000 [?] . 
The vertices in the network represent 115 teams, while the links denote 613 
regular-season games between the two teams they connect. The teams are divided 
into 12 conferences containing around 8-12 teams each and generally games are 
more frequent between members from the same conference than between those 
from different conferences. 

In ECM, the number of parameters to be optimized is exponential in the
number of clusters [7]. When the number of classes is larger than 10, calculations are
not tractable. But we can consider only a subclass of bbas with a limited number of focal
sets [7]. In this example, we constrain the focal sets to be composed of at most two
classes (except $\Omega$). Fig. 1-b shows how the modularity varies with the number of
communities. For credal partitions, the peak is at c = 10. This is consistent with
the original network (shown in Fig. 4-a), which is composed of 10 large communities (more
than 8 members) and 2 small communities (8 members or fewer).
Setting c = 10 in ECM, we can find all ten large communities, eight of which
are exactly detected. For the nodes in small communities, ECM assigns most
of them to imprecise classes. As there are more than 10 communities in this
network, we use $\omega_{i+j}$ to denote the imprecise communities instead of $\omega_{ij}$ in the
figures related to this experiment to avoid misunderstanding.
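To give a sense of how much this restriction reduces the problem, the sketch below enumerates the focal sets allowed when only sets of at most two classes, plus the whole frame, are kept. Counting the empty set among the focal sets follows the ECM formulation and is an assumption here, as are the function name `limited_focal_sets` and the index-based representation of classes.

```python
from itertools import combinations

def limited_focal_sets(c, max_size=2):
    """Focal sets over c classes: the empty set, all sets of size at most
    max_size, and the whole frame."""
    sets = [()]                                    # empty set
    for size in range(1, max_size + 1):
        sets.extend(combinations(range(c), size))  # singletons, pairs, ...
    if max_size < c:
        sets.append(tuple(range(c)))               # the whole frame
    return sets

restricted = limited_focal_sets(10)   # c = 10, as in this experiment
n_restricted = len(restricted)        # 1 + 10 + 45 + 1
n_full = 2 ** 10                      # all subsets of a 10-class frame
```

For c = 10 this leaves 57 masses per node instead of 1024, which is what keeps the optimization manageable on the football network.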

For hard partitions, nodes in small communities are simply assigned to
their “closest” detected cluster, which will certainly result in a loss of accuracy
in the final results. Credal partitions make cautious decisions by clustering nodes
about which we are uncertain into imprecise communities. The introduced imprecise
clusters avoid the risk of grouping a node into a specific class without strong
belief. In other words, a data pair is clustered into the same specific group
only when we are quite confident, and thus the misclassification rate will be
reduced.




Fig. 4. American football network (a. Original network; b. ECM; c. CM; d. FCM).


5 Conclusion 

In this paper, a new community detection approach combining the evidential
modularity, spectral mapping and evidential c-means is presented to identify the
overlapping graph structure in complex networks. Although many overlapping
community-detection algorithms have been developed before, most of them are
based on fuzzy partitions. Credal partitions, in the frame of belief function
theory, have many advantages compared with fuzzy ones and enable us to gain
better insight into the data structure. As shown in the experimental results
for two real-world networks, credal partitions can reflect our degree of
uncertainty more intuitively. The credal partition is an extension of both
hard and fuzzy ones, so richer information about the graph structure can be
obtained from the structure detected by the method proposed here. We expect
that evidential clustering approaches will be employed with promising results
in the detection of overlapping communities in complex networks of practical
significance.

Kuang Zhou et al.